NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

Zhu, Zifeng; Jia, Mengzhao; Zhang, Zhihan; Li, Lang; Jiang, Meng (April 2025, Association for Computational Linguistics)

Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity of real-world multi-chart scenarios. Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs’ capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. Our evaluation of a wide range of MLLMs reveals significant performance gaps compared to humans. These results highlight the challenges in multi-chart comprehension and the potential of MultiChartQA to drive advancements in this field. Our code and data are available at https://github.com/Zivenzhu/Multi-chart-QA.
more » « less
Full Text Available
Generalized Cycle Benchmarking Algorithm for Characterizing Midcircuit Measurements

https://doi.org/10.1103/PRXQuantum.6.010310

Zhang, Zhihan; Chen, Senrui; Liu, Yunchao; Jiang, Liang (January 2025, PRX Quantum)

Midcircuit measurements (MCMs) are crucial ingredients in the development of fault-tolerant quantum computation. While there have been rapid experimental progresses in realizing MCMs, a systematic method for characterizing noisy MCMs is still under exploration. In this work, we develop a cycle benchmarking (CB)-type algorithm to characterize noisy MCMs. The key idea is to use a joint Fourier transform on the classical and quantum registers and then estimate parameters in the Fourier space, analogous to Pauli fidelities used in CB-type algorithms for characterizing the Pauli-noise channel of Clifford gates. Furthermore, we develop a theory of the noise learnability of MCMs, which determines what information can be learned about the noise model (in the presence of state preparation and terminating measurement noise) and what cannot, which shows that all learnable information can be learned using our algorithm. As an application, we show how to use the learned information to test the independence between measurement noise and state-preparation noise in an MCM. Finally, we conduct numerical simulations to illustrate the practical applicability of the algorithm. Similar to other CB-type algorithms, we expect the algorithm to provide a useful toolkit that is of experimental interest. Published by the American Physical Society2025
more » « less
Full Text Available
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems

https://doi.org/10.18653/v1/2025.naacl-long.566

Zhu, Zifeng; Jia, Mengzhao; Zhang, Zhihan; Li, Lang; Jiang, Meng (January 2025, Association for Computational Linguistics)

Full Text Available
Flexible and Stretchable Vitrimers for Sustainable Electronics

https://doi.org/10.1021/acsami.4c16995

Biswal, Agni K; Hong, Peter; Zhang, Zhihan; Zheng, Yiwen; Gupta, Surabhit; Nepal, Dhriti; Iyer, Vikram; Vashisth, Aniruddh (February 2025, ACS Applied Materials & Interfaces)

Full Text Available
Living Sustainability: In-Context Interactive Environmental Impact Communication

https://doi.org/10.1145/3749488

Zhang, Zhihan; Thavikulwat, Puvarin; Metzger, Alexander Le; Mei, Yuxuan; Hähnlein, Felix; Englhardt, Zachary; Abowd, Gregory D; Patel, Shwetak; Schulz, Adriana; Cheng, Tingyu; et al (September 2025, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Climate change demands urgent action, yet understanding the environmental impact (EI) of everyday objects and activities remains challenging for the general public. While Life Cycle Assessment (LCA) offers a comprehensive framework for EI analysis, its traditional implementation requires extensive domain expertise, structured input data, and significant time investment, creating barriers for non-experts seeking real-time sustainability insights. Here we present the first autonomous sustainability assessment tool that bridges this gap by transforming unstructured natural language descriptions into in-context, interactive EI visualizations. Our approach combines language modeling and AI agents, and achieves >97% accuracy in transforming natural language into a data abstraction designed for simplified LCA modeling. The system employs a non-parametric datastore to integrate proprietary LCA databases while maintaining data source attribution and allowing personalized source management. We demonstrate through case studies that our system achieves results within 11% of traditional full LCA, while accelerating from hours of expert time to real-time. We conducted a formative elicitation study (N=6) to inform the design objectives of such EI communication augmentation tools. We implemented and deployed the tool as a Chromium browser extension and further evaluated it through a user study (N=12). This work represents a significant step toward democratizing access to environmental impact information for the general public with zero LCA expertise.
more » « less
Full Text Available
IHEval: Evaluating Language Models on Following the Instruction Hierarchy

Zhang, Zhihan; Li, Shiyang; Zhang, Zixuan; Liu, Xin; Jiang, Haoming; Tang, Xianfeng; Gao, Yifan; Li, Zheng; Wang, Haodong; Tan, Zhaoxuan; et al (April 2025, Association for Computational Linguistics)
Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.)
The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model only achieves 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs.
more » « less
Full Text Available
Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples

https://doi.org/10.1145/3627673.3679608

Zeng, Qingkai; Bai, Yuyang; Tan, Zhaoxuan; Feng, Shangbin; Liang, Zhenwen; Zhang, Zhihan; Jiang, Meng (October 2024, ACM)

Full Text Available
Incorporating Sustainability in Electronics Design: Obstacles and Opportunities

https://doi.org/10.1145/3706598.3713299

Englhardt, Zachary; Hähnlein, Felix; Mei, Yuxuan; Lin, Tong; Sun, Connor Masahiro; Zhang, Zhihan; Patel, Shwetak; Schulz, Adriana; Iyer, Vikram (April 2025, ACM)

Full Text Available
Aligning Large Language Models with Implicit Preferences from User-Generated Content

https://doi.org/10.18653/v1/2025.acl-long.384

Tan, Zhaoxuan; Li, Zheng; Liu, Tianyi; Wang, Haodong; Yun, Hyokun; Zeng, Ming; Chen, Pei; Zhang, Zhihan; Gao, Yifan; Wang, Ruijie; et al (January 2025, Association for Computational Linguistics)

Full Text Available
PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

Zhang, Zhihan; Lee, Dong-Ho; Fang, Yuwei; Yu, Wenhao; Jia, Mengzhao; Jiang, Meng; Barbieri, Francesco (August 2024, Proceedings of the 62th Annual Meeting of the Association for Computational Linguistics)

Instruction tuning has remarkably advanced large language models (LLMs) in understand- ing and responding to diverse human instruc- tions. Despite the success in high-resource lan- guages, its application in lower-resource ones faces challenges due to the imbalanced foun- dational abilities of LLMs across different lan- guages, stemming from the uneven language distribution in their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a high-resource language, primarily English, as the pivot to enhance instruction tuning in lower-resource languages. It trains the model to first process instructions in the pivot language, and then produce responses in the target lan- guage. To evaluate our approach, we introduce a benchmark, X-AlpacaEval, of instructions in 4 languages (Chinese, Korean, Italian, and Spanish), each annotated by professional trans- lators. Our approach demonstrates a significant improvement in the instruction-following abili- ties of LLMs by 29% on average, compared to directly responding in the target language alone. Further experiments validate the versatility of our approach by employing alternative pivot languages beyond English to assist languages where LLMs exhibit lower proficiency.
more » « less
Full Text Available

« Prev Next »

Search for: All records